Historic,  Archive  Document 

Do  not  assume  content  reflects  current 
scientific  knowledge,  policies,  or  practices. 




United States
Department of
Agriculture




National 
Agricultural 
Statistics 
Service 


Research  Division 


An  Assessment  of  the 
Appropriate  Size  Measure 
for Probability Proportional
to  Size  Path  Sampling 
of  Apple  Trees 


SRB  Research  Report 
Number  SRB-95-01 


April  1995 

Darla DeJong
Mike Fleming
William C. Iwig
Charles R. Perry




An  Assessment  of  the  Appropriate  Size  Measure  for  Probability  Proportional  to  Size  Path 
Sampling of Apple Trees, by Darla DeJong, Nationwide Insurance Company, Columbus, Ohio¹;
Mike Fleming, William C. Iwig, and Charles R. Perry, Survey Research Branch, Research Division,
National  Agricultural  Statistics  Service,  United  States  Department  of  Agriculture,  Washington,  DC 
20250-2000.  March  1995.  Research  Report  No.  SRB-95-01. 


ABSTRACT 


This  study  evaluates  the  optimal  branch  size  measure  for  use  in  probability  proportional  to  size 
(PPS)  sampling  of  apple  tree  branches  to  estimate  the  number  of  apples  on  a  tree.  The  analysis  is 
based  on  data  collected  in  the  1991  Apple  Objective  Yield  Pilot  Study  conducted  in  Washington. 
Current  NASS  Objective  Yield  procedures  for  sampling  orange,  tart  cherry,  and  other  fruit  trees 
select  branches  for  fruit  counts  using  probability  proportional  to  size  procedures  where  the  size 
variable  is  the  cross  sectional  area  (CSA)  of  the  branches.  Analysis  indicates  that  the  CSA  is  in  fact 
the  appropriate  size  measure  to  use  in  PPS  sampling  of  an  apple  tree,  applied  in  a  multiple  stage 
random  path  approach,  regardless  of  the  tree's  variety,  rootstock,  age,  or  geographic  region.  PPS 
sampling  of  branches  based  on  CSA  also  provides  smaller  sampling  variances  than  simple  random 
sampling  (SRS)  of  branches.  A  cost  benefit  analysis  needs  to  be  conducted  to  provide  final 
recommendations  on  an  optimal  within  tree  sampling  approach  for  apples. 

KEY  WORDS 

Apple  Objective  Yield;  PPS  Sampling;  Fruit  Count  Estimation;  Random  Path  Sampling. 


This paper was prepared for limited distribution to the research community outside the
U.S. Department of Agriculture. The views expressed herein are not necessarily those of
NASS or USDA.


The  authors  wish  to  thank  Jim  Cox  of  the  Washington  Agricultural  Statistics  Service  for  making  the 
arrangements  with  the  participating  apple  growers,  providing  logistical  support,  editing,  and 
assisting  in  the  analysis  of  the  data. 


¹Darla DeJong was on the staff of the Ohio Research Unit at the time of the study and
preliminary  analysis. 




TABLE  OF  CONTENTS 


SUMMARY

INTRODUCTION

TERMINOLOGY

DATA

ANALYSIS METHODS

RESULTS

CONCLUSIONS

RECOMMENDATIONS

REFERENCES

APPENDIX


SUMMARY 


This study evaluates branch size measures of the form $x_i = r_i^{\gamma}$, where $r_i$ = radius of branch i and
$\gamma \ge 0$, for use in probability proportional to size (PPS) sampling of apple tree branches to estimate
the number of apples on a tree. Analysis is based on data collected in the 1991 Apple Objective
Yield Pilot Study conducted in Washington. A total of 51 trees were used in this pilot study and
they  were  completely  mapped  and  enumerated,  which  entailed  the  following  steps:  the  cross 
sectional  area  (CSA)  of  each  branch  including  the  stem  of  the  tree  was  measured;  each  branch 
was  labeled  with  an  identification  number;  a  schematic  sketch  was  made  of  the  branching 
structure  of  each  tree  for  later  use  as  a  check  on  the  accuracy  of  the  data.  Current  NASS  objective 
yield  within  tree  sampling  procedures  for  orange,  tart  cherry,  and  other  fruit  trees  select  branches 
for  fruit  counts  using  probability  proportional  to  size  procedures  where  the  size  variable  is  the 
cross  sectional  area  of  the  branches.  The  PPS  sampling  procedures  are  applied  in  a  multiple  stage 
random  path  approach,  where  a  new  branch  is  selected  at  each  successive  fork  as  long  as  two  or 
more  branches  at  a  fork  are  larger  than  an  established  cutoff  CSA.  In  order  to  see  whether  the 
PPS  sampling  method  is  significantly  affected  by  variety,  rootstock,  age,  or  region,  the  trees  were 
grouped accordingly and the coefficients of variation (CV's) were calculated. Analysis indicates that the CSA is in fact the
appropriate  size  measure  to  use  in  PPS  sampling  of  an  apple  tree  regardless  of  variety,  rootstock, 
age,  or  geographic  region.  PPS  sampling  of  branches  based  on  CSA  provided  smaller  sampling 
variances  than  simple  random  sampling  (SRS)  of  branches  regardless  of  how  the  trees  were 
grouped.  However,  PPS  sampling  requires  additional  time  for  measuring  and  selecting  branches 
at  each  stage.  A  cost  benefit  analysis  needs  to  be  conducted  to  determine  the  optimal  within  tree 
sampling  procedure.  The  cost  analysis  would  provide  recommendations  regarding: 

1)  PPS  sampling  vs.  simple  random  sampling 

2)  the  optimal  cutoff  CSA 

3)  the  number  of  branches  to  sample  at  each  stage. 


INTRODUCTION 

Presently,  apple  production  forecasts  are 
made  by  the  Washington  Agricultural 
Statistics Service by means of nonprobability
surveys and administrative data.
As  a  result,  the  July  forecast  has  ranged 
from  as  high  as  22%  above  to  as  low  as  32% 
below  the  final  estimate.  National 
Agricultural  Statistics  Service  (NASS)  State 
Statistical  Offices  (SSO)  in  Florida, 
California,  Oregon,  and  Michigan  currently 
conduct  fruit  and  nut  tree  objective  yield 
surveys  based  on  probability  survey 
methods. 

To  determine  optimal  survey  methods  for 
apples,  a  pilot  study  was  conducted  during 
September  and  October  1991.  The 
Washington  State  Statistical  Office  selected 
51  trees  in  the  Yakima,  Wenatchee,  and 
Columbia  Basin  regions.  The  sample 
consisted  of  many  varieties,  rootstocks,  and 
ages  of  apple  trees.  Each  tree  was 
completely  mapped  and  enumerated.  The 
objective  of  the  pilot  study  was  to  evaluate 
within  tree  sampling  procedures  relative  to 
the  expected  coefficient  of  variation  (CV) 
and  determine  whether  certain 
characteristics  such  as  rootstock,  variety, 
age,  or  geographic  region  significantly  affect 
the  efficiency  of  the  sampling  process. 
Specifically,  this  initial  study  evaluates  the 
optimal  branch  size  measure  to  use  in 
probability  proportional  to  size  sampling. 

TERMINOLOGY 

Probability  proportional  to  size  (PPS) 
sampling  has  been  used  by  NASS  in  several 
forms  to  estimate  the  number  of  fruit  on 
orange,  tart  cherry,  hazelnut,  and  other  fruit 
trees. Typically, the size variable that has
been used is the cross sectional area (CSA)
of the branch, which is proportional to the
radius squared ($CSA = \pi r^2$). Other size
measures could also be used. Branches are
selected proportional to their size beginning
at the stem of the tree until the enumerator
reaches the outermost branches or until the
CSA is less than a prescribed cutoff.

At  each  stage  of  PPS  path  sampling,  one 
branch  is  selected  at  random  according  to  its 
relative  size  to  the  other  branches  at  that 
fork  to  which  it  is  attached.  This  selection 
process,  which  begins  at  the  stem  and 
proceeds  along  a  path  to  the  terminal  branch 
of  a  tree,  consists  of  multiple  stages  where 
each  stage  corresponds  to  a  fork  in  the  path. 
With  each  successive  stage,  the  CSA's 
become  smaller.  The  process  of  selecting 
branches  ends  when  an  enumerator  reaches 
that  outer  limb  whose  CSA  still  exceeds  a 
certain  prescribed  cutoff  CSA  but  where 
every  succeeding  limb  has  a  CSA  less  than 
the  cutoff  CSA.  Thus,  the  number  of  stages 
in  a  multiple  stage  process  is  dependent  on 
the  cutoff  CSA:  the  larger  the  cutoff  CSA, 
the  fewer  the  number  of  stages.  That  last 
limb  in  the  selection  process  which  meets 
this  cutoff  criterion  is  called  a  terminal 
branch.  A  primary  branch  is  a  limb  which 
exceeds  the  cutoff  CSA,  is  attached  to  the 
stem  of  the  tree,  and  supports  at  least  two 
terminal  branches.  Those  intermediate 
branches  which  connect  a  primary  branch 
with  a  terminal  branch  may  or  may  not  have 
fruit.  If  there  is  fruit  on  an  intermediate 
branch,  it  is  called  path  fruit. 
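The selection walk described above can be sketched in a few lines of Python. This is only an illustration, not the procedure NASS enumerators follow in the field: it uses the Redchief tree that the report tabulates in Table 1 (already mapped at the 0.5 in² cutoff), and the names `TREE`, `children`, and `draw_path` are hypothetical. It assumes that a child's identification number is its parent's number with one digit appended, with the stem labeled 0.

```python
import random

# Redchief tree from Table 1: branch id -> (apples on branch, CSA in square inches)
TREE = {
    "0": (2, 7.6),
    "1": (19, 1.5), "2": (20, 1.7), "3": (24, 2.4), "4": (28, 2.2),
    "5": (9, 1.0), "6": (5, 1.7), "7": (9, 4.2),
    "71": (11, 1.3), "72": (2, 3.8),
    "721": (10, 0.8), "722": (10, 0.8), "723": (0, 1.0), "724": (6, 2.3),
    "7241": (2, 1.2), "7242": (46, 1.9),
}


def children(tree, branch_id):
    """Branches at the fork that ends `branch_id` (assumed id convention)."""
    if branch_id == "0":  # the stem's children carry single-digit ids
        return sorted(b for b in tree if len(b) == 1 and b != "0")
    return sorted(b for b in tree
                  if len(b) == len(branch_id) + 1 and b.startswith(branch_id))


def draw_path(tree, rng):
    """Walk from the stem to a terminal branch, picking one branch per fork PPS to CSA."""
    path, prob = ["0"], 1.0
    while True:
        kids = children(tree, path[-1])
        if not kids:  # no mapped branches beyond this one: terminal branch reached
            return path, prob
        sizes = [tree[b][1] for b in kids]
        pick = rng.choices(kids, weights=sizes, k=1)[0]
        prob *= tree[pick][1] / sum(sizes)  # running product of fork probabilities
        path.append(pick)


rng = random.Random(1991)  # arbitrary seed so the sketch is repeatable
path, prob = draw_path(TREE, rng)
print(path, round(prob, 4))
```

Each call returns one stem-to-terminal path together with the probability of having selected its terminal branch, which is the quantity the estimator divides by.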

The method of accommodating the path fruit
in the estimation process will be discussed
later. However, in simplest terms, if the
intermediate branches have no fruit, then the
estimate of the quantity of fruit on a tree
(tree load) is $y / \pi_t$, where $y$ is the number
of fruit on the terminal branch and all others
that it supports and $\pi_t$ is the probability of
selecting that terminal branch. For each
branch selection scheme, there is a different
$\pi_t$. We will look at the probability proportional
to size and simple random sampling branch
selection schemes.

Two stage sampling is
the same as the multiple stage sampling
except that the process does not go beyond
the second fork along the path.
Consequently, all apples on the branch
selected at the second fork would be counted
regardless of the size of the succeeding
branches. This approach was evaluated as
an option that would simplify data
collection, since it involves less within tree
sampling. For any particular tree, it is the
same as multiple stage sampling when the
cutoff CSA is large enough that there is no
more sampling of branches after the second
fork.

Schematic Drawing of the Redchief Enumerated in Table 1.

Simple  random  sampling  (SRS)  is  a  method 
of  sampling  where  the  probability  of 
selecting  a  branch  at  a  fork  is  uniformly 
distributed.  SRS  can  be  viewed  as  PPS 
sampling  where  the  size  measure  is  the 
radius  to  the  zero  power.  Then  the  size 
measure  is  the  same  for  all  branches  and  the 


2 


selection  reverts  to  random  sampling.  It  has 
the  attractive  feature  that  it  is  simpler  to  use 
and  to  teach  than  the  PPS  method. 

Jessen  [2]  developed  the  multiple  stage 
method  of  PPS  sampling  based  on  CSA  in 
the  mid-1950's  for  use  in  the  Orange 
Objective  Yield  Survey.  It  is  the  same 
method  that  is  still  being  used  by  NASS.  A 
comparison  of  various  sampling  plans  for 
estimating  the  tree  load  of  apple  trees  is 
discussed  in  Houseman  [1].  Of  the  eleven 
plans  that  he  considered,  the  PPS  method  of 
the  kind  developed  by  Jessen  and  evaluated 
in  this  pilot  study  was  the  one  that 
Houseman  deemed  best  because  it  had  a 
relatively  small  coefficient  of  variation  (CV) 
and  required  less  labor  than  the  next  best 
alternative  (which  involved  identifying  all 
terminal  branches  on  each  tree).  However, 
Houseman  did  not  specifically  evaluate 
alternative  size  measures  or  evaluate  the 
approach  over  a  wide  range  of  conditions. 
This  study  evaluates  those  issues  and  also 
evaluates  their  application  in  two  stage 
sampling. 

DATA 

The  trees  which  were  used  in  this  pilot  study 
were  completely  mapped  and  enumerated 
(see  descriptive  information  in  Table  A1  of 
the  Appendix).  That  entailed  the  following: 
the  CSA  of  each  branch  including  the  stem 
of  the  tree  was  measured;  each  branch  was 
labeled  with  an  identification  number;  a 
schematic  sketch  was  made  of  the  branching 
structure  of  each  tree  for  later  use  as  a  check 
on  the  accuracy  of  the  data. 

However,  before  the  analysis  began,  two 
problems  with  the  data  had  to  be  resolved. 


1. A smaller CSA was sometimes
reported for a parent branch than for a
subsequent branch. It is reasonable to
assume that the branches become smaller the
further one goes down a path, though the
opposite might occur if an irregularity in the
shape of a branch affects an enumerator's
measurement. To make the size of the
branches decrease along a path, a correction
was made to the inconsistent branch to make
it arbitrarily 0.1 in² smaller than its parent.
There were 9 out of 2,094 branches that had
to be corrected this way.

2. Missing data occurred when some
terminal branches were beyond the reach of
an enumerator. In this case, the average
of the parent's CSA and the cutoff CSA
of 0.5 in² was used as the imputed value.
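The two data corrections can be written as one small rule. The following is a minimal sketch, not the program used in the study; the function name `clean_csa` and the use of `None` for an unmeasured branch are assumptions made for illustration, and the sketch treats only a child strictly larger than its parent as inconsistent, since the text does not say how ties were handled.

```python
CUTOFF_CSA = 0.5   # cutoff CSA used in the pilot study, square inches
STEP = 0.1         # arbitrary decrement applied to inconsistent branches, square inches


def clean_csa(parent_csa, child_csa):
    """Return the CSA to use for a child branch after the two corrections."""
    if child_csa is None:
        # Branch was out of reach: impute the midpoint of parent CSA and cutoff CSA.
        return (parent_csa + CUTOFF_CSA) / 2.0
    if child_csa > parent_csa:
        # Child reported larger than its parent: force it 0.1 square inch below the parent.
        return parent_csa - STEP
    return child_csa


print(clean_csa(2.0, 2.4))   # inconsistent measurement
print(clean_csa(2.3, None))  # missing measurement
print(clean_csa(2.0, 1.2))   # consistent measurement, left unchanged
```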

A cutoff CSA of 0.5 in² was used in
designating  terminal  branches.  A  schematic 
diagram  of  a  seven  year  old  Redchief  apple 
tree  was  shown  previously.  It  illustrates  the 
naming  convention  that  was  used  to  identify 
the  branches.  The  concept  of  depth 
facilitated  matters  when  writing  the 
computer  programs;  it  is  defined  as  the 
number  of  digits  in  the  identification  number 
with  the  proviso  that  the  depth  of  the  stem  is 
zero.  It  can  be  viewed  as  a  measure  of  how 
far  removed  a  branch  is  from  the  stem. 
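As a small illustration of the depth rule, the sketch below computes depth from an identification number held as a string. The `parent` helper is an assumption inferred from the Redchief numbering in Table 1 (a child's number is its parent's number with one digit appended); the report does not state that rule explicitly.

```python
def depth(branch_id: str) -> int:
    """Depth = number of digits in the id, except that the stem ("0") has depth 0."""
    return 0 if branch_id == "0" else len(branch_id)


def parent(branch_id: str) -> str:
    """Assumed convention: drop the last digit; first stage branches hang off the stem."""
    if branch_id == "0":
        raise ValueError("the stem has no parent")
    return "0" if len(branch_id) == 1 else branch_id[:-1]


for bid in ["0", "7", "72", "724", "7242"]:
    print(bid, depth(bid))
```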

To  illustrate  what  the  raw  data  looked  like, 
Table  1  contains  the  data  for  the  Redchief 
apple  tree  whose  schematic  drawing  is 
shown  in  Figure  1. 

Table 1: Raw Data for a Mapped Tree

Redchief     Age: 7 Years     Total Fruit: 203

Branch            Depth   Number of    Number of Apples   Cross Sectional   Type of
Identification            Apples on    Supported by       Area (in²)        Branch *
Number                    Branch       Branch
0                 0        2           203                7.6               S
1                 1       19            19                1.5               T
2                 1       20            20                1.7               T
3                 1       24            24                2.4               T
4                 1       28            28                2.2               T
5                 1        9             9                1.0               T
6                 1        5             5                1.7               T
7                 1        9            96                4.2               P
71                2       11            11                1.3               T
72                2        2            76                3.8               I
721               3       10            10                0.8               T
722               3       10            10                0.8               T
723               3        0             0                1.0               T
724               3        6            54                2.3               I
7241              4        2             2                1.2               T
7242              4       46            46                1.9               T

* S = Stem   P = Primary   I = Intermediate   T = Terminal

Since the cutoff CSA determines the
smallest branch which an enumerator may
select, a small cutoff CSA would mean that
a large number of branches will be sampled
but with fewer apples per terminal branch,
whereas a large cutoff CSA would mean
fewer branches sampled but more apples per
terminal branch. The number of apples that
an enumerator can be expected to count
depends on the cutoff CSA and therefore is
related to within sampling cost.

To evaluate the effect of the cutoff CSA on
the estimate, all mappings of the 51 trees
were modified and re-mapped as if a
different cutoff CSA had been used in the
field.


The  original  cutoff  CSA  that  was  employed 
for the pilot study was 0.5 in². The
procedure  for  modifying  the  mapping  of  a 
tree  involved  a  computer  program  which 
sorted  the  branches  by  CSA  in  ascending 
order  and  then  kept  all  branches  which  had  a 
CSA  greater  than  or  equal  to  a  new  but 
larger  cutoff  CSA.  For  each  new  cutoff 
CSA,  the  branching  structure  of  the  tree's 
crown  had  to  be  reconfigured,  in  order  to 
make  it  conform  with  how  the  tree  would 


4 


have  been  mapped  with  the  new  cutoff  CSA. 
With  a  larger  cutoff  CSA,  there  are  fewer 
limbs  that  could  be  branches,  so  that  the 
resulting  reconfigured  schematic  of  the 
crown  would  appear  as  if  it  had  collapsed  to 
something  smaller.  With  each  successive 
collapse  of  the  tree  by  the  computer 
corresponding  to  a  larger  cutoff  CSA,  the 
system  of  branching  became  simpler  until  at 
the  end  of  the  process  the  map  of  the  fully 
collapsed  tree  consisted  of  only  the  stem  and 
two  branches.  Special  attention  focused  on 
the  process  of  reconfiguring  the  map  to 
ensure  that  after  a  collapse  the  new  crown 
consisted  of  intermediate  branches  which 
split  at  least  twice  and  that  the  assignment  of 
apples  in  the  new  branching  structure  was 
correct. 

ANALYSIS  METHODS 

The general form of the multiple stage
estimator of the tree load can be written as:

$$\hat{y} = \sum_{i} \frac{y_{k_i}}{\pi_{k_i}} \qquad (1)$$

where $\pi_{k_i}$ is the probability of selecting
branch i along the k-th path by using either
PPS or SRS methods and $y_{k_i}$ is the number
of apples on the same branch and along the
same path to terminal branch t. Each path
ends at a unique terminal branch; therefore,
for each tree there is a unique one-to-one
correspondence between the inventory of
terminal branches and the inventory of
paths. If there is no path fruit, then (1)
reduces to the same $y / \pi_t$ as referred to in the
terminology section.

For the Redchief data shown in Table 1, the
path, k, consisting of the branches, i:
{0, 7, 72, 724, 7242} would have the
corresponding apple count, $y_{k_i}$: {2, 9, 2, 6, 46}.

The basis of the PPS method is selecting one
branch at a fork according to its relative size
with respect to the other branches at that
fork. Size variables that historically have
been used for within tree sampling are all
functions of the limb radii at the branch
points. For example, PPS sampling using
limb cross sectional area as the size variable
is equivalent to PPS sampling using limb
radius squared as the size variable. Simple
random sampling is equivalent to PPS
sampling using limb radius to the zero
power. For this study, we considered size
variables of the form:

$$x_i = r_i^{\gamma} \qquad (2)$$

where $r_i$ is the radius of branch i and $\gamma \ge 0$.

The probability of selecting branch i at a
fork is:

$$p_i = \frac{x_i}{\sum_{\text{branches } m \text{ at fork}} x_m} \qquad (3)$$

Next the probability of selecting some
branch along a path is the product of the p's
at each fork along the way from the stem of
the tree to the branch of concern, that is, for
branch i along the k-th path:

$$\pi_{k_i} = \prod_{j=0}^{i} p_j \qquad (4)$$

When j = 0, the path section referred to is
the part of the tree between the ground and
below the first stage branches. When j = 1,
the path section referred to is the part of the
tree on path k above the first stage branch
and below the second stage branches and so
on.

Table 2: Probabilities of Selection Along a Path for the Redchief

            PPS to CSA                                                    SRS
Branch i    p_j                                              π_{k_i}     p_j    π_{k_i}
0           1                                                1           1      1
7           4.2 / (1.5+1.7+2.4+2.2+1.0+1.7+4.2) = .2857      .2857       1/7    1/7
72          3.8 / (1.3+3.8) = .7451                          .2129       1/2    1/14
724         2.3 / (.8+.8+1.0+2.3) = .4694                    .0999       1/4    1/56
7242        1.9 / (1.2+1.9) = .6129                          .0612       1/2    1/112


Let $\pi_{k_t}$ denote the probability of selecting
the terminal branch on path k, then the
expected value of the multiple stage
estimator, $\hat{y}$, is:

$$E[\hat{y}] = \sum_{\text{paths } k} \pi_{k_t} \, \hat{y}(k) \qquad (5)$$

where $\hat{y}(k)$ denotes the realization of $\hat{y}$
when path k is selected. By using induction
to evaluate the summation, the expected
value of $\hat{y}$ is exactly equal to the number
of apples on the tree. The two stage
estimator is just a special case of the
multiple stage one. Therefore, the PPS and
SRS multiple stage or two stage estimators
are unbiased.


Using the Redchief tree as in the previous
example, Table 2 shows the $p_j$ and $\pi_{k_i}$
along path k for PPS based on CSA and
SRS. Substituting the values of $\pi_{k_i}$ from
Table 2 for PPS sampling into (1), we get an
estimate for the number of apples on the
Redchief in our example:

$$\hat{y}_{pps} = \frac{2}{1} + \frac{9}{.2857} + \frac{2}{.2129} + \frac{6}{.0999} + \frac{46}{.0612} = 854$$

The actual tree load is 203, so by using our
chosen path the estimate exceeds the actual
value by 651 apples. Similarly an estimate
of the tree load via SRS is:

$$\hat{y}_{srs} = \frac{2}{1} + \frac{9}{1/7} + \frac{2}{1/14} + \frac{6}{1/56} + \frac{46}{1/112} = 5,581$$


Again  the  estimate  exceeds  the  actual 
number  of  apples.  However,  the  estimates 
from  paths  leading  to  other  terminal 
branches  would  be  low.  As  mentioned 
previously,  overall  the  estimator  is  unbiased. 
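The Redchief arithmetic can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' program: the helper names (`parent`, `fork_members`, `path_probabilities`, `path_estimate`) are hypothetical, branch radii are backed out of the tabulated CSAs through CSA = πr², and forks are inferred from the identification numbers (a child's number is its parent's number plus one digit).

```python
import math

# Redchief tree from Table 1: branch id -> (apples on branch, CSA in square inches)
TREE = {
    "0": (2, 7.6),
    "1": (19, 1.5), "2": (20, 1.7), "3": (24, 2.4), "4": (28, 2.2),
    "5": (9, 1.0), "6": (5, 1.7), "7": (9, 4.2),
    "71": (11, 1.3), "72": (2, 3.8),
    "721": (10, 0.8), "722": (10, 0.8), "723": (0, 1.0), "724": (6, 2.3),
    "7241": (2, 1.2), "7242": (46, 1.9),
}
PATH = ["0", "7", "72", "724", "7242"]  # the path used in the worked example


def parent(branch_id):
    """Assumed id convention: drop the last digit; single-digit ids hang off the stem."""
    return "0" if len(branch_id) == 1 else branch_id[:-1]


def fork_members(tree, branch_id):
    """All branches that share a fork with `branch_id`, including itself."""
    par = parent(branch_id)
    return [b for b in tree
            if b != "0" and len(b) == len(branch_id) and parent(b) == par]


def path_probabilities(tree, path, gamma):
    """Cumulative selection probability for each branch on the path, size = r**gamma."""
    def size(b):
        radius = math.sqrt(tree[b][1] / math.pi)  # CSA = pi * r**2
        return radius ** gamma

    probs, pi_k = [], 1.0
    for b in path:
        if b != "0":  # the stem is always selected
            pi_k *= size(b) / sum(size(m) for m in fork_members(tree, b))
        probs.append(pi_k)
    return probs


def path_estimate(tree, path, gamma):
    """Tree load estimate: apples on each path branch divided by its selection probability."""
    probs = path_probabilities(tree, path, gamma)
    return sum(tree[b][0] / p for b, p in zip(path, probs))


print([round(p, 4) for p in path_probabilities(TREE, PATH, 2.0)])
print(round(path_estimate(TREE, PATH, 2.0)))  # PPS to CSA (gamma = 2): 854
print(round(path_estimate(TREE, PATH, 0.0)))  # SRS (gamma = 0): 5581
```

Setting `gamma` to 2 gives the PPS to CSA column of Table 2 and the estimate of 854; setting it to 0 gives the SRS column and the estimate of 5,581.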

The size variable of the form $r^{\gamma}$ that provides
the optimal PPS multiple stage estimator is
obtained by minimizing the variance in (6)
over the general class of size variables given
in (2).

$$\operatorname{var}(\hat{y}) = \sum_{\text{paths } k} \pi_{k_t} \left( \hat{y}(k) - E[\hat{y}] \right)^2 \qquad (6)$$

The variance of $\hat{y}$ is a function of $\gamma$ defined
in (2) and the cutoff CSA. Since every tree
had been completely enumerated, the
variance can be computed for any given $\gamma$
and cutoff CSA.
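Because a mapped tree lists every path, the expected value and variance of the estimator can be computed by enumeration for any radius power. The sketch below does this for the single Redchief tree of Table 1 at its original 0.5 in² cutoff. It is an illustration only: the helper names are hypothetical, the variance is taken to be the probability-weighted squared deviation of each path's estimate from the expected value, and the id convention (child number = parent number plus one digit) is inferred from Table 1.

```python
import math

# Redchief tree from Table 1: branch id -> (apples on branch, CSA in square inches)
TREE = {
    "0": (2, 7.6),
    "1": (19, 1.5), "2": (20, 1.7), "3": (24, 2.4), "4": (28, 2.2),
    "5": (9, 1.0), "6": (5, 1.7), "7": (9, 4.2),
    "71": (11, 1.3), "72": (2, 3.8),
    "721": (10, 0.8), "722": (10, 0.8), "723": (0, 1.0), "724": (6, 2.3),
    "7241": (2, 1.2), "7242": (46, 1.9),
}


def parent(branch_id):
    """Assumed id convention: drop the last digit; single-digit ids hang off the stem."""
    return "0" if len(branch_id) == 1 else branch_id[:-1]


def fork_members(tree, branch_id):
    """All branches that share a fork with `branch_id`, including itself."""
    par = parent(branch_id)
    return [b for b in tree
            if b != "0" and len(b) == len(branch_id) and parent(b) == par]


def terminal_paths(tree):
    """One stem-to-tip path per terminal branch (a branch that is nobody's parent)."""
    parents = {parent(b) for b in tree if b != "0"}
    paths = []
    for b in tree:
        if b not in parents:
            path = [b]
            while path[-1] != "0":
                path.append(parent(path[-1]))
            paths.append(path[::-1])
    return paths


def mean_and_variance(tree, gamma):
    """Expected value and variance of the path estimator over all paths, size = r**gamma."""
    def size(b):
        return (tree[b][1] / math.pi) ** (gamma / 2.0)  # r**gamma with r = sqrt(CSA/pi)

    outcomes = []  # (probability of the path, estimate from that path)
    for path in terminal_paths(tree):
        pi_k, estimate = 1.0, 0.0
        for b in path:
            if b != "0":
                pi_k *= size(b) / sum(size(m) for m in fork_members(tree, b))
            estimate += tree[b][0] / pi_k
        outcomes.append((pi_k, estimate))
    mean = sum(p * e for p, e in outcomes)
    variance = sum(p * (e - mean) ** 2 for p, e in outcomes)
    return mean, variance


for gamma in (0.0, 1.0, 2.0, 3.0):
    mean, variance = mean_and_variance(TREE, gamma)
    print(f"gamma={gamma:.0f}  E[y]={mean:.1f}  CV={math.sqrt(variance) / mean:.2f}")
```

For every radius power the expected value equals the actual load of 203, which is the unbiasedness property, and on this one tree the variance at a radius power of 2 comes out well below the variance at a radius power of 0. A single tree says nothing about where the minimum lies across the 51 trees.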

The  analysis  for  this  pilot  study  consisted  of 
grouping  trees  according  to  types  of 
rootstock,  variety,  geographic  region,  and 
age  and  then  comparing  the  CV's  of  the 
estimates  for  PPS  and  SRS  methods  for  both 
multiple  stage  and  two  stage  approaches.  In 
order  to  find  a  size  variable  that  could  be 
used  over  all  51  trees  or  over  subgroups  of 
them,  a  general  CV  was  computed  for  each 
group,  cutoff  CSA,  and  sampling  procedure. 


total  apples  in  group 


The sums of squared errors (sse) of the
estimated apple load of each tree were summed
at regularly spaced cutoff CSA's over
groupings of trees to get the CV at a
particular $\gamma$ and cutoff for that grouping. The
sse for a particular tree is defined as:


ssel=T,(yKKl-Y)2 

k 

where $\hat{y}_{ik}$ = estimated number of apples on
tree i from path k and $Y_i$ = observed number
of apples on tree i. Three dimensional
graphs of this general CV fitted against the
radius power ($\gamma$) and the cutoff CSA over all
51 trees were reviewed to determine the
optimal $\gamma$ for multiple and two stage
approaches. In addition, the trees were
grouped according to variety, rootstock, age,
and region. The three dimensional graphs
were reviewed to evaluate the effect of the
different factors.

Groupings  of  trees  were  as  follows: 

1.  Varieties. 

a.  Group  1  consisted  of  all 
varieties  from  the  Red 
Delicious  strain. 

b.  Group  2  consisted  of  all 
the  other  varieties. 

2.  Rootstock. 

a.  Standard. 

b.  Semi-dwarf. 

3.  Age. 

a.  1-10  years. 

b.  11-20  years. 

c.  Over  20  years. 

4.  Region. 

a.  Yakima. 

b.  Columbia  Basin 

c.  Wenatchee 




RESULTS 

Figures 1 and 2 are three dimensional graphs
showing the general CV (Equation 7) as a
function of the cutoff CSA and the radius
power ($\gamma$), based on data from all 51 trees in
the study. The figures indicate that for both
the multiple stage and two stage approaches,
a value for $\gamma$ of 2 corresponds to a minimum
in the CV, regardless of the cutoff CSA.
Figure 3 is a cross section of the surface
shown in Figure 1 along the smallest cutoff
CSA, 0.5 in², and parallel to the $\gamma$ axis. It
clearly shows that using a $\gamma$ near 2
corresponds to the minimum CV. An
optimal $\gamma$ of 2 implies that the CSA
($= \pi r^2$) is the optimal size variable for
probability proportional to size sampling of
apple trees. This finding is also consistent
for all of the groupings of trees evaluated
in the study, as indicated by Figures A1-A10
in the Appendix. This means that the
PPS method based on CSA as Jessen
proposed and used for the Orange
Objective Yield Survey is appropriate for
within tree sampling of apple trees.

The graphs also indicate that simple
random sampling (SRS), which corresponds
to $\gamma = 0$, produces a higher CV than the
PPS to CSA method. The reduction in
CV's due to PPS sampling, as compared to
SRS, is dependent on the cutoff CSA. For
a cutoff CSA of 0.5 in², analysis indicates
the PPS CV is about 1/3 of the SRS CV.

Figures 1 and 2 also indicate that the two
stage method tends to produce lower CV's
than the multiple stage method over all
possible cutoff CSA's. This is more
clearly shown in Figure 4, which is a cross
section along $\gamma = 2$ from Figures 1 and 2.
For cutoff CSA's in the 1 to 2 in² range,
the two stage CV's are approximately 20%
less than the multiple stage CV's.

In the two stage process, the enumerator
terminates the branch selection process
after the second fork along the path.
Branches at that point are terminal branches
and will generally have a CSA much larger
than the cutoff CSA. Two stage sampling
is really multiple stage sampling at the first
two forks. In the multiple stage approach,
as long as two or more branches at each
fork are greater than the cutoff CSA,
branches continue to be sampled at each
stage. Consequently, more apples per tree
are counted with the two stage approach for
a comparable cutoff CSA than in a multiple
stage approach, since all apples on all
branches beyond the second fork are
counted regardless of the branch size.

Figure 3: CV vs $\gamma$ at Cutoff CSA = 0.5 in²

For cutoff CSA's in the 1 to 2 in² range,
Figure 5 indicates the two stage approach
involves counting 20-30 more apples per
tree. Consequently, the two stage approach
provides a lower CV but the cost is
probably higher than a comparable multiple
stage approach.

Relative per unit cost data will be required
to find the optimal sampling technique. For
the two stage technique the cost includes
the time involved for: identifying,
measuring, and sampling the primaries;
identifying, measuring, and sampling
second stage branches; and counting the
fruit on the selected branch. The cost for
the multiple stage approach does include
the measuring of additional branches, but
involves counting fewer apples. A cost
analysis would evaluate the optimal cutoff
for each approach and the optimal number
of branches to sample at each stage relative
to CV and total cost. (All analyses in this
report were based on sampling one branch
at each stage, but multiple branches could
be sampled depending on the cost
efficiency.)

Figure 5: Average Number of Apples at Cutoff CSA


CONCLUSIONS

This study evaluates branch size measures
of the form $x_i = r_i^{\gamma}$, where $r_i$ = radius of
branch i and $\gamma \ge 0$, for use in probability
proportional to size (PPS) sampling of
apple tree branches to estimate the number
of apples on a tree. Graphical analysis
demonstrates that a size measure of $r^2$ will
produce the lowest CV's regardless of the
tree's variety, rootstock, age, or
geographic region. This size measure
produces the lowest CV's for both two
stage and multiple stage random path
sampling. Other NASS fruit tree surveys
typically use the branch cross sectional area
(CSA) as the size measure for PPS
sampling. Since $r^2$ is proportional to the
CSA ($CSA = \pi r^2$), this study also
confirms that the CSA is an optimal size
measure, in relation to CV, for PPS
sampling. PPS sampling based on a size
measure of $r^0$ is the same as simple random
sampling (SRS). Consequently, this
analysis shows that PPS sampling based on
CSA produces lower CV's than SRS.
However, PPS sampling does require
measuring branches at each stage where
SRS does not require any measurements, so
there is additional cost involved with PPS
sampling. A thorough cost analysis would
determine the cost efficiency of each
approach.

The graphical analysis also indicates that
the two stage random sampling approach
involves counting 20-30 more apples per
tree, on average, than the multiple stage
approach for cutoff CSA's in the 1 to 2 in²
range. The resulting two stage CV's, using
PPS sampling based on CSA, are
approximately 20% less than the multiple
stage CV's. Since a two stage approach is
really a multiple stage approach with a
relatively large cutoff CSA, cost analysis
would first focus on an optimal cutoff CSA
for a multiple stage application. If the
optimal cutoff CSA was 4 in² or more, then
possibly the simple two stage approach
would be attractive.

RECOMMENDATIONS 

The  next  step  in  developing  an  Apple 
Objective  Yield  program  is  to  conduct  a 
cost  efficiency  analysis  to  determine  the 
optimal  within  tree  sampling  approach.  The 
objective  of  such  a  cost  analysis  should  be 
to  evaluate  the  trade-offs  between  costs  and 
CV's.  Within  tree  sampling  costs  are 
dependent  on  the  amount  of  time  involved 
in  measuring  and  sampling  branches  at  each 
stage  and  in  the  amount  of  time  required  to 
count  the  apples.  The  cost  analysis  would 
provide  recommendations  regarding: 

1)  PPS  sampling  vs.  simple 
random  sampling 

2)  the  optimal  cutoff  CSA 

3)  the  number  of  branches  to 
sample  at  each  stage. 

A  simple  cost  model  could  be  developed 
based  on  "expert"  estimates  of  the  time 
involved  to  measure  and  sample  branches  at 
each  stage  and  to  count  the  apples  on  each 
terminal  branch.  A  small  field  study  could 
be  conducted  by  the  Ohio  Applications 
Research  Section  to  help  develop  or  verify 
a  cost  model.  The  model  could  be  applied 
to  the  1991  pilot  study  data  for  a  cost 
efficiency  analysis. 




REFERENCES 


[1] Houseman, Earl E. (1978).
"Comparative Efficiency of Sampling
Plans (Illustration - Apple Trees)".
Economics, Statistics, and Cooperative
Service. U.S. Department of Agriculture.
Washington, D.C.

[2] Jessen, Raymond J. (1955).
"Determining the fruit count on a tree by
randomized branch sampling". Biometrics,
11, 99-109.

[3] Warren, Fred B. and William H.
Wigton (1971). "Sampling for Objective
Yields of Apples and Peaches". Statistical
Reporting Service. U.S. Department of
Agriculture. Washington, D.C.

[4] Warren, Fred B. (1973). "Sampling
for Objective Estimates of Apple Yields".
Statistical Reporting Service. U.S.
Department of Agriculture. Washington,
D.C.




APPENDIX 


Table A1: Background Information on the Mapped Trees

Tree   Age   Radius of    Variety             Rootstock
             Stem (mm)
11      7     39.51       Redchief            Semi-dwarf
12      7     30.06       Redchief            Semi-dwarf
21     11     81.07       Golden Delicious    Standard
22     11     85.98       Golden Delicious    Standard
31     13     68.73       Newtown             Semi-dwarf
32     13     67.22       Newtown             Semi-dwarf
41      2     26.81       Gala                Semi-dwarf
42      2     20.27       Gala                Semi-dwarf
51     15     55.50       Redchief            Semi-dwarf
52     15     59.09       Redchief            Semi-dwarf
61      8     39.25       Granny Smith        Semi-dwarf
62      8     35.10       Granny Smith        Semi-dwarf
81     10     90.07       Early Brite         Semi-dwarf
82     10     51.67       Early Brite         Semi-dwarf
91     19     77.17       Oregonspur          Semi-dwarf
92     19    105.31       Oregonspur          Semi-dwarf
121    34    219.68       Golden Delicious    Standard
122    34    176.68       Golden Delicious    Standard
131     8     45.32       Redchief            Semi-dwarf
132     8     47.74       Redchief            Semi-dwarf
151     8     36.25       Redchief            Semi-dwarf
152     8     46.21       Redchief            Semi-dwarf
161     8     58.03       Grandspur           Semi-dwarf
162     8     59.95       Grandspur           Semi-dwarf
171    11     54.00       Oregonspur          Semi-dwarf
172    11     44.86       Oregonspur          Semi-dwarf
181     3     62.46       Fuji                Semi-dwarf
182     3     49.23       Fuji                Semi-dwarf
191     7     61.47       Redchief            Semi-dwarf
192     7     69.32       Redchief            Semi-dwarf
201    10     70.93       Golden Delicious    Semi-dwarf
202    10     70.9        Golden Delicious    Semi-dwarf
211    27    106.57       Bisbee              Semi-dwarf
212    27     87.17       Bisbee              Standard
221    11     47.96       Redchief            Semi-dwarf
222    11     59.26       Redchief            Semi-dwarf
231    23     65.36       Earlistripe         Standard
232    23     87.17       Earlistripe         Standard
241    14     37.37       Redchief            Semi-dwarf
242    14     52.85       Redchief            Semi-dwarf
271    27    116.42       Golden Delicious    Standard
272    27    120.75       Golden Delicious    Standard
281     8    123.27       Fuji                Semi-dwarf
282     8    125.75       Fuji                Semi-dwarf
291    18    105.79       Redspur             Standard
292    18     30.40       Redspur             Standard
301    26    178.41       Hiearly             Standard
311    23    130.56       Hiearly             Standard
312    23    136.10       Hiearly             Standard
321    11     42.27       Redchief            Semi-dwarf
322    11     40.53       Redchief            Semi-dwarf




